{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### web图片爬取"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"https://gzfc.evergrande.com/UploadFile/EditorUpload/20191202174605_1689.jpg\" alt=\"\">"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from IPython.core.display import display,HTML   #使用ipython展示模块\n",
    "display(HTML('<img src=\"https://gzfc.evergrande.com/UploadFile/EditorUpload/20191202174605_1689.jpg\" alt=\"\">'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 使用session"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####   session会话的用法\n",
    "很多学习Python的同学们可能都知道requests库的用法，但是不知道的是还有另外一种用法：requests.session\n",
    "session其实是一个会话类，requests的所有请求方法，底层都是调用的这个类的对象。\n",
    "####    其他方法和session的区别在于：\n",
    "* 直接使用requests调用请求方法发送请求，每次都会创建一个新的session（会话对象），所以没有之前请求的cookies信息\n",
    "* 直接创建一个session对象来发请求，那么每次发请求用的都是这个会话对象，所有能够保存之前的会话信息（cookies数据\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 利用session爬取新闻信息"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 使用find标签"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* find() 接收通过css选择器返回element对象和element对象组成的列表\n",
    "* first参数被置为True， 则只返回找到的第一个Element对象\n",
    "* CSS选择器示例：\n",
    "   * a\n",
    "   * a.someClass\n",
    "   * a#someID\n",
    "   * a[target=_blank]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "12月1日，2019赛季中超收官战，广州恒大淘宝队主场3:0击败上海绿地申花。上半场韦世豪补时破门，下半场朴志洙再下一城，艾克森锁定胜局。最终，广州队以23胜3平4负积72分的佳绩勇夺中超冠军，成为中超联赛史上第一支“八冠王”球队！\n",
      "自2010年进入足球以来，恒大9年共获17项顶级赛事冠军，创造了中国职业足球前所未有的辉煌战绩。此次问鼎中超“八冠王”，也是继中超七连冠、亚冠双冠后，恒大队开创的又一个新的里程碑。\n",
      "赛后，广东省体育局、广州市体育局第一时间发出贺信，祝贺广州恒大淘宝队成功夺得2019中超联赛冠军。\n",
      "\n",
      "\n",
      "实施新老交替，上演“青春风暴”\n",
      "2019年初，广州恒大淘宝俱乐部引进韦世豪、高准翼、严鼎皓等年轻球员，与原有的杨立瑜、钟义浩等人一起，强力实施一线队新老交替，全队平均年龄从30岁下降到27岁，面貌焕然一新。\n",
      "在俱乐部2019赛季管理会议上，恒大老板许家印明确要求，新赛季要大浪淘沙、优胜劣汰，给年轻球员提供更多的位置和机会，加快培养更多更优秀的年轻球员。赛季开始后，在塔利斯卡、郜林等中外核心球员长时间因伤缺阵，球队落后领头羊多达8分的逆境中，主教练卡纳瓦罗带领韦世豪、杨立瑜、高准翼、严鼎皓、钟义浩等一批年轻球员挑起大梁、奋起直追，以一波不可思议的13连胜创下队史纪录并追平中超最长连胜纪录，一举占据积分榜首，此后一直领跑12轮直至最终夺冠。截至目前，杨立瑜以12次助攻排名中超本土助攻榜第一，韦世豪以11个进球名列本土射手榜第一。\n",
      "恒大队本赛季成功“换血”并夺冠，为其续写辉煌奠定了坚实基础。\n",
      "\n",
      "\n",
      "进一步从严管理，铸就强大战斗力\n",
      "恒大企业文化的核心是“要么不做，要做就做最好”，通过严格管理、奖罚分明，铸就强大凝聚力和战斗力。恒大淘宝俱乐部从2010年成立至今，始终贯彻恒大企业文化，从严管理、重奖重罚，打造了冠绝中超的俱乐部管理风格。\n",
      "2019赛季，恒大俱乐部大力实施“大浪淘沙、优胜劣汰”的竞争机制，督促老队员不吃老本焕发第二春，激发年轻球员抓住机遇、快速成长，成为球队的中流砥柱。同时，俱乐部修订出台“三九”队规，持续执行“恒大国脚八项规定”，一如既往地对教练组和球员上紧发条，鞭策三军用命、奋勇拼搏。\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# 建立session会话\n",
    "from requests_html import HTMLSession\n",
    "session =HTMLSession()\n",
    "# 使用requests库发起get请求，括号内（）填写需要爬取内容的地址\n",
    "r =session.get(\"https://gzfc.evergrande.com/news.aspx?ftid=48396\")\n",
    "news = r.html.find('.NewsCont>p')#使用css选择器选择属性NewCont下面的p标签所有内容\n",
    "\n",
    "#结果输出：\n",
    "for new in news:\n",
    "    print(new.text)#获得新闻内容\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###### 渲染出一个Element对象的HTML内容:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<p style=\"text-align:center;\">\n",
      "<img src=\"/UploadFile/EditorUpload/20191202174415_3905.jpg\" alt=\"\" />\n",
      "</p>\n"
     ]
    }
   ],
   "source": [
    "# 建立session会话\n",
    "from requests_html import HTMLSession\n",
    "session =HTMLSession()\n",
    "# 使用requests库发起get请求，括号内（）填写需要爬取内容的地址\n",
    "r =session.get(\"https://gzfc.evergrande.com/news.aspx?ftid=48396\")\n",
    "news = r.html.find('.NewsCont>p',first=True)#使用css选择器选择属性NewCont下面的第一个p标签的内容\n",
    "#结果输出：\n",
    "print(new.html)#获得p标签的HTML内容"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###### 获取Element对象内的特定子Element对象，返回列表:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<Element 'img' src='imgaes/English.png'>,\n",
       " <Element 'img' src='imgaes/search_botton.png' onclick='searchClick()'>,\n",
       " <Element 'img' src='imgaes/biao.jpg'>,\n",
       " <Element 'img' src='imgaes/News/title.png'>,\n",
       " <Element 'img' src='/UploadFile/EditorUpload/20191202174605_1689.jpg' alt=''>,\n",
       " <Element 'img' src='/UploadFile/EditorUpload/20191202174301_0604.jpg' alt=''>,\n",
       " <Element 'img' src='/UploadFile/EditorUpload/20191202174415_3905.jpg' alt=''>,\n",
       " <Element 'img' src='imgaes/News/daying.png'>,\n",
       " <Element 'img' src='imgaes/News/Email.png'>,\n",
       " <Element 'img' src='imgaes/index/s.png'>,\n",
       " <Element 'img' src='UploadFile\\\\2020-03\\\\bf5de705-ffec-4927-9c90-e52227616dd1.jpg' width='142' height='93'>,\n",
       " <Element 'img' src='UploadFile\\\\2020-02\\\\d2212336-c3ab-42df-8271-3a0e165c1aee.jpg' width='142' height='93'>,\n",
       " <Element 'img' src='UploadFile\\\\2020-02\\\\a207b339-a9f5-428d-b03d-5b86af0904bf.jpg' width='142' height='93'>,\n",
       " <Element 'img' src='UploadFile\\\\2020-02\\\\e5a2cfd9-a636-4d88-bc2d-b26cc8ec03ac.jpg' width='142' height='93'>,\n",
       " <Element 'img' src='UploadFile\\\\2020-02\\\\5ff26787-11d7-4245-8e0e-5a0e6b8a1b08.jpg' width='142' height='93'>,\n",
       " <Element 'img' src='UploadFile\\\\2019-08\\\\eb4168b1-0b3b-4096-aa38-ed138b1ce694.png' title='恒大集团'>,\n",
       " <Element 'img' src='UploadFile\\\\2019-08\\\\342fb2f4-319b-4c43-a36c-419b7ab29c72.png' title='恒大地产集团'>,\n",
       " <Element 'img' src='imgaes/link/link_04.png' title='NIKE'>,\n",
       " <Element 'img' src='imgaes/link/link_03.png' title='21CN'>]"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 建立session会话\n",
    "from requests_html import HTMLSession\n",
    "session =HTMLSession()\n",
    "# 使用requests库发起get请求，括号内（）填写需要爬取内容的地址\n",
    "r =session.get(\"https://gzfc.evergrande.com/news.aspx?ftid=48396\")\n",
    "news = r.html.find('img')#使用css选择器选择属性NewCont下面的所有img标签的内容\n",
    "#结果输出：\n",
    "news          #输出为一个列表，内含该网页所有img标签的对象\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###### 获取到只包含某些文本的Element对象"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Element 'a' href='http://www.baidu.com/link?url=nf4Ze8P42oANi6sG2K2m2XQy74oi6BuLOS_UheVcuiTJOCyPkIF1xLKSgxzJYnHgV3n_OWWQFI8C9_qhrw5WqPA-tuKfiWJFLMLbOiwjGP9OKFKj_OJcNJk-3zeFT8HCMTv53TAL97jwYtYdT2d2D8x50YpxkEFuDcar5NbKvEDkZfwc8qwlz7OkqmPPmGMLws4N7AbU5vqvfVTYKjMPva' target='_blank' data-url='https://baike.baidu.com/item/%E5%B9%BF%E5%B7%9E%E6%81%92%E5%A4%A7%E6%B7%98%E5%AE%9D%E8%B6%B3%E7%90%83%E4%BF%B1%E4%B9%90%E9%83%A8/14680150' data-visited='off' data-xcx='false' class=('c-blocka', 'WA_LOG_SF') data-a-148a6260=''>\n",
      "广州市恒大淘宝足球俱乐部（Guangzhou Evergrande Taobao FC）是中国广州的一所职业足球俱乐部，现参加中国足球协会超级联赛。2011赛季启用广州天河体育场作为主场。广州恒大淘宝足球俱乐部前身是成立于1954年的广州市足球队。1993年1月，广州市足球队通过和太阳神集团合作，成为中国第一家政府与企业合股的职业足球俱乐部。2010年3月1日，恒大集团买断球队全部股权，俱乐部更名为广州恒大足球俱乐部。2012年俱乐部首次参加亚洲足球俱乐部冠军联赛并进入八强，2013年获得亚洲足球俱乐部冠军联赛冠军，这也是中国足球俱乐部第一次问鼎该项赛事的冠军，同年获亚足联最佳俱乐部奖。2014年6月5日，阿里巴巴入股恒大俱乐部50%的股权；同年7月4日俱乐部更名为广州恒大淘宝足球俱乐部。2015年11月6日，恒大淘宝正式上市，登陆新三板，成为亚洲足球第一股。2017年11月9日，广州恒大...\n",
      "<Element 'a' target='_blank' href='http://www.baidu.com/link?url=G4XycwW6rJ4TRBKW8vfP9fV_ywxlq_iweUDDRQEK3QHjrXCF8S6Wwr0MOnyQ3htJeH_1NaO0sWB6lBXx0KnKiK' class=('c-showurl',) style='text-decoration:none;'>\n",
      "https://gzfc.evergrande.com/de...\n",
      "<Element 'a' target='_blank' href='http://www.baidu.com/link?url=mPags0mF3HaiqE9hBdmqMMnoJnLiFSVmPENqtdwYO3g3mF3VBuUI4_4uwzwvo5K9' class=('c-showurl',) style='text-decoration:none;'>\n",
      "https://www.gzevergrandefc.com/\n"
     ]
    }
   ],
   "source": [
    "from requests_html import HTMLSession\n",
    "session =HTMLSession()\n",
    "r =session.get(\"https://www.baidu.com/s?wd=%E5%B9%BF%E5%B7%9E%E6%81%92%E5%A4%A7%E8%B6%B3%E7%90%83%E4%BF%B1%E4%B9%90%E9%83%A8&rsv_spt=1&rsv_iqid=0xd8e35245000243a9&issp=1&f=3&rsv_bp=1&rsv_idx=2&ie=utf-8&tn=baiduhome_pg&rsv_enter=1&rsv_dl=ts_0&rsv_sug3=27&rsv_sug1=17&rsv_sug7=100&rsv_sug2=0&prefixsug=%25E5%25B9%25BF%25E5%25B7%259E%25E6%2581%2592%25E5%25A4%25A7&rsp=0&inputT=11816&rsv_sug4=13321\")\n",
    "report =r.html.find(\"a\",containing=\"evergrande\")#查找a标签，设置关键字evergrande\n",
    "#结果输出：\n",
    "for reports in report:\n",
    "    print(reports)\n",
    "    print(reports.text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 使用link标签\n",
    "获取本页面所有的链接并返回一个列表, 保留了url在页面中原本的形式（已经自动去掉了html标签）:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'about.aspx?ftid=298',\n",
       " 'data.aspx',\n",
       " 'default.aspx',\n",
       " 'english/default.aspx',\n",
       " 'http://bbs.gzevergrandefc.com/',\n",
       " 'http://gzevergrandetaobaofc.tmall.com/',\n",
       " 'http://hd.21cn.com/',\n",
       " 'http://webscan.360.cn/index/checkwebsite/url/www.gzevergrandefc.com',\n",
       " 'http://www.acmilan.com/it',\n",
       " 'http://www.beian.miit.gov.cn',\n",
       " 'http://www.evergrande.com/',\n",
       " 'http://www.nike.com/cn/zh_cn/',\n",
       " 'http://www.realmadrid.com/cs/Satellite/es/Prehome_ES2.htm',\n",
       " 'http://www.reenoo.com',\n",
       " 'https://gzfc-ticket.evergrande.com/',\n",
       " 'https://www.vcg.com/topic/100079',\n",
       " 'news.aspx',\n",
       " 'news.aspx?fid=138',\n",
       " 'news.aspx?ftid=48379',\n",
       " 'news.aspx?ftid=48384',\n",
       " 'news.aspx?ftid=48388',\n",
       " 'news.aspx?ftid=48392',\n",
       " 'news.aspx?ftid=48396',\n",
       " 'news.aspx?ftid=48403',\n",
       " 'news.aspx?ftid=48411',\n",
       " 'news.aspx?ftid=48413',\n",
       " 'photos.aspx?tid=718',\n",
       " 'talent.aspx',\n",
       " 'team.aspx',\n",
       " 'video.aspx',\n",
       " 'video.aspx?ftid=48400'}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#建立session会话\n",
    "from requests_html import HTMLSession\n",
    "session =HTMLSession()\n",
    "r =session.get(\"https://gzfc.evergrande.com/default.aspx\")   #发送get请求\n",
    "href = r.html.links   #获取该页面的所有的链接\n",
    "href"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 使用absolute_links\n",
    "获取本页面所有的链接并返回一个列表, 自动将url转换为绝对路径形式（已经自动去掉了html标签）:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'http://bbs.gzevergrandefc.com/',\n",
       " 'http://gzevergrandetaobaofc.tmall.com/',\n",
       " 'http://hd.21cn.com/',\n",
       " 'http://webscan.360.cn/index/checkwebsite/url/www.gzevergrandefc.com',\n",
       " 'http://www.acmilan.com/it',\n",
       " 'http://www.beian.miit.gov.cn',\n",
       " 'http://www.evergrande.com/',\n",
       " 'http://www.nike.com/cn/zh_cn/',\n",
       " 'http://www.realmadrid.com/cs/Satellite/es/Prehome_ES2.htm',\n",
       " 'http://www.reenoo.com',\n",
       " 'https://gzfc-ticket.evergrande.com/',\n",
       " 'https://gzfc.evergrande.com/about.aspx?ftid=298',\n",
       " 'https://gzfc.evergrande.com/data.aspx',\n",
       " 'https://gzfc.evergrande.com/default.aspx',\n",
       " 'https://gzfc.evergrande.com/english/default.aspx',\n",
       " 'https://gzfc.evergrande.com/news.aspx',\n",
       " 'https://gzfc.evergrande.com/news.aspx?fid=138',\n",
       " 'https://gzfc.evergrande.com/news.aspx?ftid=48379',\n",
       " 'https://gzfc.evergrande.com/news.aspx?ftid=48384',\n",
       " 'https://gzfc.evergrande.com/news.aspx?ftid=48388',\n",
       " 'https://gzfc.evergrande.com/news.aspx?ftid=48392',\n",
       " 'https://gzfc.evergrande.com/news.aspx?ftid=48396',\n",
       " 'https://gzfc.evergrande.com/news.aspx?ftid=48403',\n",
       " 'https://gzfc.evergrande.com/news.aspx?ftid=48411',\n",
       " 'https://gzfc.evergrande.com/news.aspx?ftid=48413',\n",
       " 'https://gzfc.evergrande.com/photos.aspx?tid=718',\n",
       " 'https://gzfc.evergrande.com/talent.aspx',\n",
       " 'https://gzfc.evergrande.com/team.aspx',\n",
       " 'https://gzfc.evergrande.com/video.aspx',\n",
       " 'https://gzfc.evergrande.com/video.aspx?ftid=48400',\n",
       " 'https://www.vcg.com/topic/100079'}"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#建立session会话\n",
    "from requests_html import HTMLSession\n",
    "session =HTMLSession()\n",
    "r =session.get(\"https://gzfc.evergrande.com/default.aspx\")   #发送get请求\n",
    "all = r.html.absolute_links   #获取该页面的所有的链接\n",
    "all"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 更加复杂的css选择器"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'中国广东广州天河区黄埔大道西78号恒大中心22楼 广州恒大淘宝足球俱乐部 邮编：510620 广州恒大淘宝足球俱乐部\\n邮箱：gzevergrandefc@sina.com'"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from requests_html import HTMLSession\n",
    "session =HTMLSession()\n",
    "r = session.get('https://gzfc.evergrande.com/news.aspx?ftid=48396')\n",
    "sel =\".foot>.foot_bottom>ul>li>a\"#利用css子元素选择器，使用>逐层逐层选取需要的元素\n",
    "\n",
    "place =r.html.find(sel,first=True)\n",
    "place.text"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  使用xpath爬取信息"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "王者归来 八星恒大 {'https://gzfc.evergrande.com/news.aspx?ftid=48396'}\n",
      "保利尼奥双响塔利斯卡建功，广州恒大淘宝客场3:1战胜河北华夏幸福 {'https://gzfc.evergrande.com/news.aspx?ftid=48392'}\n",
      "塔利斯卡梅开二度，广州恒大淘宝2:0战胜上海上港领跑积分榜 {'https://gzfc.evergrande.com/news.aspx?ftid=48388'}\n",
      "恒大青训再添硕果——恒大二队勇夺青超U19A组冠军 {'https://gzfc.evergrande.com/news.aspx?ftid=48384'}\n",
      "广州恒大淘宝2:2战平河南建业，韦世豪独造两球，保利尼奥点射 {'https://gzfc.evergrande.com/news.aspx?ftid=48379'}\n"
     ]
    }
   ],
   "source": [
    "from requests_html import HTMLSession\n",
    "session = HTMLSession()\n",
    "r = session.get(\"https://gzfc.evergrande.com/default.aspx\")\n",
    "news=r.html.xpath(\"//div[@class='Newlist']/ul/li/a\")#选取所有div元素，且class属性为Newlist，里面的ul元素中的li元素中的a标签\n",
    "for new in news:\n",
    "    print(new.text,new.absolute_links)#输出文字和绝对路径\n",
    "\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2020.03.13关于对球员费南多的重大处罚决定 {'https://gzfc.evergrande.com/news.aspx?fid=138&ftid=48413'}\n",
      "2020.02.27公 告 {'https://gzfc.evergrande.com/news.aspx?fid=138&ftid=48411'}\n",
      "2020.01.15广州恒大淘宝足球俱乐部2020赛季会员卡（套票）充值续费公告 {'https://gzfc.evergrande.com/news.aspx?fid=138&ftid=48403'}\n",
      "2019.12.25关于2020赛季会员卡（套票）充值续费资格及积分查询工作的通知 {'https://gzfc.evergrande.com/news.aspx?fid=138&ftid=48399'}\n",
      "2019.12.24关于下发《2020赛季会员卡（套票）充值续费管理办法》 {'https://gzfc.evergrande.com/news.aspx?fid=138&ftid=48398'}\n",
      "2019.11.30关于广州恒大淘宝VS上海绿地申花的球票售罄公告 {'https://gzfc.evergrande.com/news.aspx?fid=138&ftid=48393'}\n",
      "2019.11.27广州恒大淘宝足球俱乐部2019赛季中超联赛12月1日主场球票销售公告 {'https://gzfc.evergrande.com/news.aspx?fid=138&ftid=48390'}\n",
      "2019.11.22关于广州恒大淘宝VS上海上港的球票售罄公告 {'https://gzfc.evergrande.com/news.aspx?fid=138&ftid=48386'}\n",
      "2019.11.19广州恒大淘宝足球俱乐部2019赛季中超联赛11月23日主场球票销售公告 {'https://gzfc.evergrande.com/news.aspx?fid=138&ftid=48385'}\n",
      "2019.11.03公 告 {'https://gzfc.evergrande.com/news.aspx?fid=138&ftid=48383'}\n",
      "2019.10.27公 告 {'https://gzfc.evergrande.com/news.aspx?fid=138&ftid=48378'}\n",
      "2019.10.23广州恒大淘宝足球俱乐部2019赛季中超联赛10月27日主场球票销售公告 {'https://gzfc.evergrande.com/news.aspx?fid=138&ftid=48374'}\n",
      "2019.10.18关于对球员韦世豪、杨立瑜的处罚决定 {'https://gzfc.evergrande.com/news.aspx?fid=138&ftid=48370'}\n",
      "2019.10.17广州恒大淘宝足球俱乐部2019赛季亚冠半决赛次回合10月23日主场球票销售公告 {'https://gzfc.evergrande.com/news.aspx?fid=138&ftid=48368'}\n",
      "2019.10.10关于广州恒大淘宝足球训练基地十月份开放日的公告 {'https://gzfc.evergrande.com/news.aspx?fid=138&ftid=48367'}\n"
     ]
    }
   ],
   "source": [
    "\n",
    "from requests_html import HTMLSession\n",
    "session =HTMLSession()\n",
    "r = session.get(\"https://gzfc.evergrande.com/news.aspx?fid=138\")\n",
    "jifenbiao =r.html.xpath(\"//div[@class='Nlist']/ul/li\")\n",
    "for news in jifenbiao:\n",
    "    print(news.text,news.absolute_links)\n",
    "    \n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "ename": "SyntaxError",
     "evalue": "invalid syntax (<ipython-input-32-556cbaf5cd32>, line 4)",
     "output_type": "error",
     "traceback": [
      "\u001b[1;36m  File \u001b[1;32m\"<ipython-input-32-556cbaf5cd32>\"\u001b[1;36m, line \u001b[1;32m4\u001b[0m\n\u001b[1;33m    JJ = r.html.xpath(\"/*[@id=\"1\"]/h3/a\")\u001b[0m\n\u001b[1;37m                               ^\u001b[0m\n\u001b[1;31mSyntaxError\u001b[0m\u001b[1;31m:\u001b[0m invalid syntax\n"
     ]
    }
   ],
   "source": [
    "from requests_html import HTMLSession\n",
    "session = HTMLSession()\n",
    "r = session.get(\"https://www.baidu.com/s?wd=%E6%9E%97%E4%BF%8A%E6%9D%B0&rsv_spt=1&rsv_iqid=0xdd4d3e0a000d45d4&issp=1&f=8&rsv_bp=1&rsv_idx=2&ie=utf-8&tn=baiduhome_pg&rsv_enter=1&rsv_dl=tb&rsv_sug3=11&rsv_sug1=5&rsv_sug7=100&rsv_sug2=0&inputT=1742&rsv_sug4=2371\")\n",
    "JJ = r.html.xpath(\"/*[@id=\"1\"]/h3/a\")\n",
    "print(JJ)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "199.333px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
